Introduction 

The measurement of achievement is a critically 
important part of efforts to improve student learning. 
Instructional and evaluative decisions are based on the 
type of formal and informal feedback gleaned from the 
tests used to measure achievement. It is imperative 
that these tests be as technically sound as possible. 

Measurement experts have made great strides in 
providing the technical background necessary to 
accurately and reliably measure achievement. This 
knowledge, however, is not effectively communicated to 
the classroom teacher. Courses in tests and 
measurement are not typically required for teacher 
certification, and in-service training or technical 
support for classroom assessment is equally rare. When 
questioned about the tests they develop, teachers 
consistently indicate a lack of confidence in their 
effectiveness and validity (Stiggins and Bridgeford, 
1985). 

Research indicates that teacher-made tests 
dominate the assessments used by teachers, regardless 
of test purpose, grade level, or subject area (Stiggins 
and Bridgeford, 1985). Reliance on them increases with 
grade level, with greater dependence shown by science 
and math teachers than teachers in other subject areas. 
Unfortunately, the conclusions of research on the 
characteristics of teacher-made tests have been 
disappointing. 

Fleming and Chambers' (1983) examination of 
teacher-made tests found fewer than 20% of all items 
written above the knowledge level of Bloom's Taxonomy 
(see Bloom, Madaus, and Hastings, 1981). The majority 
of items were written in a short answer format. 
Matching, multiple choice, and true-false formats were 
used far less frequently, and essay items were 
virtually nonexistent. Other troublesome 
characteristics such as ambiguous items, grammatical 
errors, and lack of directions were found to be quite 
common. 

The findings of research on teachers' use of post-hoc 
analyses of test results have been quite 
disconcerting. In their survey of elementary, junior 
high, and high school teachers, Gullickson and Ellwein 
(1985) found that few, if any, systematic analyses of 
test results were performed by classroom teachers. 
Without such analyses, there is little assurance that 
the tests serve the purposes for which they were 
designed. 

Valid, reliable, and objective assessment of 
student achievement is imperative. Teacher-made tests 
are the primary tools used in this process, but 
research implies that they are seriously flawed. The 
purpose of this study was to develop and apply a model 
that can be used in identifying assessment needs at the 
school or district level and offer suggestions as to 
how those needs can be addressed through in-service 
activities. The model focuses on the identification 
and narrowing of discrepancies between teachers' 
perceptions of their testing practices and actual 
practice. 

Methodology 

All 19 math and 16 science teachers at the four 
senior high schools (9th - 12th grades) in a mixed 
suburban/rural district participated in the project. 
Their involvement consisted of completing a brief 
survey instrument and supplying their most recently 
administered unit or quarter test. Although some 
teachers supplied multiple tests, only one per teacher 
was chosen at random for analysis. One teacher who 
completed the questionnaire failed to supply a test 
copy. Thus, 35 questionnaires and 34 tests containing 
more than 1400 items were included. 

A rating form (see Appendix A) was developed to 
analyze the sample of teacher-composed tests. Tests 
were rated on four criteria: item format, cognitive 
levels addressed, item quality, and presentation. 
While research relating performance on teacher-made 
tests to other student outcomes is virtually non- 
existent, these criteria were chosen for their 
relationship to the content validity and reliability of 
teacher-made tests. 

The proportion of items written in each format 
(e.g., multiple choice, true-false, matching, short 
answer, or essay) was calculated primarily to ascertain 
the accuracy of teachers' perceptions of their testing 
practices. While the classification of multiple 
choice, true-false, or matching items is 
straightforward, short answer and essay items are 
differentiated less easily. Items were classified as 
short answer if they required the student to respond in 
a single sentence or less. Essay items required 
organized, extended responses. The proportion of 
items written at each cognitive level, rated according 
to a modified version of Bloom's taxonomy, was 
calculated because of its relationship to content 
validity; that is, a valid test must adequately sample 
both the objectives taught and the levels of knowledge 
expected for each objective. Item quality and test 
presentation were assessed for their inextricable tie 
to test reliability. Item quality was estimated by 
reviewing the items written in each format according to 
commonly accepted recommendations (see, for example, 
Sax, 1989 or Carey, 1988). Sets of similar item 
formats were then rated on the basis of flaws of any 
type in more than or less than 20% of the items. 
Presentation was rated on characteristics such as the 
adequacy of instructions, formatting, numbering system, 
and duplication quality. These characteristics were 
measured according to modified Likert scales. 
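
The 20% rule for item quality can be sketched in a few lines 
of Python. This is only an illustration of the rating logic, 
using the codes from the Test Analysis Guide in Appendix A. 
The function name and the treatment of a proportion of 
exactly 20% (which the guide does not specify) are 
assumptions. 

```python
def rate_item_quality(flawed_items, total_items, threshold=0.20):
    """Return the Appendix A quality code for one item format on one test.

    1 = not applicable, 2 = errors in more than 20% of items,
    3 = errors in less than 20% of items, 4 = no errors.
    """
    if total_items == 0:
        return 1  # no items of this type on the test
    if flawed_items == 0:
        return 4  # items present, none flawed
    proportion = flawed_items / total_items
    # Assumption: exactly 20% is grouped with "more than 20%".
    return 2 if proportion >= threshold else 3


# Hypothetical test with 12 multiple choice items, 5 of them flawed.
print(rate_item_quality(5, 12))
```

A rater would apply the function once per item format on 
each test, which yields the five quality codes recorded on 
the guide. 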

Where appropriate, inter-rater reliability was 
measured as the percentage of agreement among raters. 
Training was provided to raters, and, based on a sample 
of several tests, reliability coefficients ranged from 
90 to 100 percent. 
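
Percentage of agreement is simple to compute by hand or in 
code. The sketch below assumes two raters who have each 
assigned one code to the same set of tests; the function 
name and the ratings are hypothetical and are not the 
study's data. 

```python
def percent_agreement(ratings_a, ratings_b):
    """Percentage of cases on which two raters assigned the same code."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must rate the same non-empty set of cases")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return 100.0 * matches / len(ratings_a)


# Hypothetical "quality of instructions" codes for ten tests.
rater_1 = [1, 2, 2, 1, 3, 2, 1, 2, 2, 1]
rater_2 = [1, 2, 2, 1, 2, 2, 1, 2, 2, 1]
print(percent_agreement(rater_1, rater_2))  # 90.0
```

Percentage of agreement does not correct for agreement 
expected by chance, so it tends to run higher than 
chance-corrected indices computed on the same ratings. 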

The Teacher Testing Questionnaire (see Appendix B) 
was used to measure teachers' perceptions of their 
testing skills. Items on the questionnaire examined 
the purposes for which the tests were used, teachers' 
perceptions of their test items (i.e., levels of 
knowledge tested, use of item formats, etc.) and their 
general testing practices, the analyses performed on 
test results, and the confidence teachers had in their 
test development skills. 

Results 

Nature of Classroom Assessment 

There is conclusive evidence that teachers 
perceive summative evaluation as the dominant purpose 
of classroom testing and that teacher-made tests are 
the most important source of information upon which 
this evaluation is based. Teachers indicated that more 
than 70% of all tests were administered for the primary 
purpose of assigning grades. They also reported using 
test results in formative manners, often to identify 
student weaknesses and modify instruction, but such use 
is secondary to the purpose of assigning grades. 

While formative and summative use of test results 
is commendable, the tremendous weight placed on 
teacher-made tests in student evaluation underscores 
the need to ensure that these tests are valid and 
reliable indicators of student performance. In fact, 
when asked to rank in importance the sources from which 
they obtained the information needed to evaluate 
students, 31 of the 35 teachers ranked their own tests 
first. Classroom participation and feedback obtained 
from instruction were ranked next in importance. 
Interestingly, standardized tests were ranked below 
classroom behavior by 28 teachers! 

Characteristics of Teacher-Made Tests 

General Characteristics. More than 1,400 items 
from 34 tests were examined. Teachers reported writing 
about 65% of these items. The remaining items were 
obtained from test guides, textbooks, workbooks, and 
other sources. The number of items per test varied 
widely from a minimum of 14 to a maximum of 103 items. 
On average, there were 42 items per test. With few 
exceptions, all tests were judged to be of reasonable 
length. 

Twenty-four of the tests (70%) were completely 
type-written, two (6%) contained typed and hand-written 
sections, and 8 (24%) were totally hand-written. In 
only four cases was duplication quality judged to be 
inadequate. Formatting was a problem in more than 70% 
of the tests. Common examples of this deficiency were 
crowding, inconsistent style and margins, and lack of 
space for answers. 

Written instructions were provided on 25 of the 34 
tests (74%). All but two of these contained 
instructions for the total test as well as subsections. 
Nine tests (26%) contained no instructions, despite the 
fact that teachers reported on the questionnaire nearly 
always including written instructions for each 
subsection. 

Instructions were deemed "nebulous" for 21 of the 
25 tests (84%) that included written instructions. 
"Nebulous" referred to instructions such as those that 
ask students to choose an answer without indicating how 
or where the choice should be recorded. This was 
particularly problematic for matching items where two 
long lists were often presented with no space provided 
for answers. The student was left to decide whether to 
match Column A to Column B, Column B to Column A, or 
draw lines between the two. 

Students were not typically informed of the point 
value of any test or item. None of the 34 tests 
contained a written explanation of the weight of that 
test in determining a student's grade. The point value 
of individual items or sections was specified in only 
six tests (18%). Ironically, teachers reported 
frequently informing students of item values. Unless 
this information was verbalized to the students prior 
to testing, there is little evidence to support such a 
contention. 

Item Formats. The results of the analyses 
pertaining to item formats are reported in Figure 1. 
According to the results observed by the researchers, 
more than 50% of all items were written in a short 
answer format. Multiple choice, matching, and true- 
false formats accounted for about 20%, 15%, and 5% of 
all items respectively. The most striking observation 
was the inclusion of only 4 essay items among the more 
than 1400 items examined. 

As shown in Figure 1, similarities were found 
between the percentages observed by the researchers and 
those perceived by the teachers for multiple choice, 
true-false, and matching formats. Teachers, however, 
perceived themselves writing far fewer short answer 
items and many more essay items than were observed. 
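
The comparison at the heart of the model, perceived minus 
observed practice, can be sketched as follows. The reported 
percentages are the self-reported averages from Table 1. The 
observed percentages are illustrative placeholders built 
from the rounded figures quoted above (with short answer set 
to an assumed 60%); they are not the exact values plotted in 
Figure 1. The function name is hypothetical. 

```python
def format_discrepancies(reported, observed):
    """Reported minus observed percentage for each item format.

    Positive values mean teachers believe they use a format more
    often than the raters observed.
    """
    return {fmt: round(reported[fmt] - observed[fmt], 2) for fmt in reported}


# Self-reported averages from Table 1.
reported = {
    "multiple choice": 15.46,
    "true-false": 8.31,
    "matching": 15.46,
    "short answer": 40.23,
    "essay": 20.69,
}

# Illustrative placeholders only, approximated from the text.
observed = {
    "multiple choice": 20.0,
    "true-false": 5.0,
    "matching": 15.0,
    "short answer": 60.0,
    "essay": 0.3,
}

for fmt, gap in format_discrepancies(reported, observed).items():
    print(f"{fmt:16s} {gap:+.2f}")
```

Formats with the largest absolute gaps are the natural 
starting points for in-service discussion. 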

Insert Figure 1 about here 

Teachers did not routinely weight item formats 
differently. As revealed in Table 1, the self-reported 
percentage of a student's score determined by any one 
item format paralleled the percentage of items written 
in that format. 



Insert Table 1 about here 

Cognitive Levels Tested. Teachers agree that the 
vast majority of items were written at the lower 
cognitive levels of knowledge and comprehension (Bloom, 
Madaus, & Hastings, 1981). A major discrepancy lies, 
however, in the perceived percentage of items written 
at higher levels. Although teachers reported that 
roughly one-fourth of all items were written at the 
application, analysis, synthesis, or evaluation level, 
the researchers' analyses placed less than 8% of all 
items at these cognitive levels, with virtually no 
items requiring students to synthesize or evaluate (see 
Figure 2). A t-test of mean differences between 
teacher perceptions of the percentage of items written 
at the levels of synthesis or evaluation and rater 
judgments of percentage of items at these levels was 
statistically significant [t=4.76, p<.001 with 
Bonferroni (Dunn, 1961) correction]. This discrepancy 
confirms Carter's (1984) finding that teachers tend to 
inaccurately classify higher order items. 
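
A t statistic of this kind, together with a Bonferroni 
adjustment, can be sketched with the Python standard 
library. The sketch assumes a paired design (each teacher's 
perceived percentage set against the raters' percentage for 
that teacher's test), which the text does not state 
explicitly. The data and the number of comparisons are 
hypothetical. 

```python
import math
from statistics import mean, stdev


def paired_t(perceived, observed):
    """Return (t, df) for the mean of the paired differences."""
    diffs = [p - o for p, o in zip(perceived, observed)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance level under a Bonferroni correction."""
    return alpha / n_comparisons


# Hypothetical percentages of higher level items for six teachers.
perceived = [10.0, 15.0, 5.0, 20.0, 10.0, 12.0]
observed = [0.0, 2.0, 0.0, 5.0, 1.0, 0.0]

t, df = paired_t(perceived, observed)
print(f"t = {t:.2f}, df = {df}")
print(f"per-comparison alpha = {bonferroni_alpha(0.05, 4):.4f}")
```

The resulting t is compared with the critical value for the 
adjusted alpha rather than the nominal one, which guards 
against inflated error rates when several cognitive levels 
are tested in the same study. 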

Insert Figure 2 about here 

Possible effects of school and subject taught were 
analyzed (see Table 2). Results indicated that the 
individual school had no effect on teachers' use of 
higher level test items. However, the subject - math 
or science - did significantly affect the percentage of 
items judged to be written at the knowledge and 
comprehension levels. No differences by subject were 
found at other cognitive levels. Although teachers of 
both disciplines wrote the majority of items at these 
two lower levels, math teachers included significantly 
greater numbers of comprehension items on their exams. 
While the science tests analyzed contained, on average, 
78% of all items at the knowledge level and 17% at the 
comprehension level, math tests had an average of 78% 
of all items written at the comprehension level with 
about 13% at the knowledge level. This finding can be 
attributed to the tendency to test math skills by 
requiring students to solve number problems 
(comprehension level). 
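
For readers who wish to run a comparable analysis on their 
own tests, a one-way analysis of variance on a single factor 
such as subject taught can be sketched as below. The 
function is a generic textbook computation, not a 
reproduction of the analysis summarized in Table 2, and the 
proportions are hypothetical. 

```python
from statistics import mean


def one_way_anova(groups):
    """Return (ss_between, ss_within, df_between, df_within, F)."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_ratio


# Hypothetical proportions of knowledge level items per test.
science = [0.80, 0.75, 0.90, 0.70]
math_tests = [0.10, 0.15, 0.05, 0.20]

ss_b, ss_w, df_b, df_w, f_ratio = one_way_anova([science, math_tests])
print(f"SS between = {ss_b:.3f}, SS within = {ss_w:.3f}")
print(f"F({df_b}, {df_w}) = {f_ratio:.2f}")
```

With only two groups, the F ratio equals the square of the 
independent-samples t statistic, so either test leads to the 
same conclusion. 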

Insert Table 2 about here 

The finding of major importance here is not the 
differences by subject at the lower levels of 
knowledge, but the lack of items in either subject at 
higher levels. Interestingly, few math teachers 
required students to apply knowledge of procedures to 
new situations. Word problems were regrettably 
scarce. 

Quality of Items. Grammatical errors discovered 
by the raters were few in number, but other item 
writing flaws were quite common across all formats. Of 
the four essay items examined, all contained major 
flaws. None contained information to guide the student 
in structuring a response or tapped higher level 
thinking skills. Of the 18 tests containing multiple 
choice items, 17 were judged to have flaws in more than 
20% of these items. Common errors were more than one 
correct answer, posing the question in the distracters 
rather than the stem, grammatical inconsistencies 
between the stem and distracters, inappropriate use of 
"all of the above" and "none of the above", and asking 
more than one question in the stem. Similar results 
were obtained for tests containing matching items, with 
perfect matching and the lack of response guidelines 
the most common deficiencies in this format. A slight 
improvement was observed for short answer and true- 
false items, but errors were still found in more than 
half of the tests containing items of these types. Of 
these, the most consistent were the use of vague stems, 
multiple blanks within an item, and the use of 
negative statements. 

Teachers' Confidence in Testing Skills 

Teachers were asked to respond on a five point 
Likert scale (1 equals "strongly disagree" to 5 equals 
"strongly agree") as to how confident they felt about 
their testing skills. They reported, on average, 
feeling confident in their ability to construct valid 
and reliable tests (M=4.40) and assess the validity and 
reliability of those tests (M=4.29). They tended to 
rate their pre-service training in tests and 
measurement as adequate (M=3.71) and were only slightly 
less assured of the adequacy of their in-service 
training (M=3.49). 

Teachers reported routinely practicing some 
commonly accepted test development procedures (see 
Table 3). They indicated that tests were almost always 
based on instructional objectives, that objective 
scoring procedures were used, and that test results 
were reviewed with students. 

Insert Table 3 about here 

It appears that teachers do not consistently use 
tables of specifications to construct their tests, nor 
do they empirically analyze item or test level data 
(see Table 3). Response data indicate they only 
occasionally tally the number of items per 
instructional objective or per level of knowledge, 
compute item statistics, eliminate items on the basis 
of item statistics, or compute an arithmetic mean for 
the test. As indicated by their own responses, 
teachers appear confident in their knowledge about 
these empirically related practices, but they tend to 
disregard them when analyzing their own tests. 

Discussion 

Test analysis guides and self-report 
questionnaires are excellent means of identifying 
misconceptions and concerns of classroom teachers 
regarding their own tests. Some caveats related to 
these analyses are in order here. First, test ratings 
should not be considered as a means of teacher 
evaluation. Many decisions will be subjective, and the 
quality of one or two tests will not necessarily be 
indicative of all tests composed by that teacher. 

Secondly, it should be recognized that test 
quality is usually a reflection of training, not 
ability. Teachers can be taught to construct valid and 
reliable tests, but such instruction and subsequent 
test development are time-consuming. Even with proper 
training, teachers may not find the time to perform the 
recommended test analyses. They must, therefore, be 
presented with time-saving tips that are easily understood 
and implemented. 

The most difficult area to address will be 
development of items testing higher order thinking 
skills. Developing and classifying such items require 
practice. An item that appears to be a higher order 
item may, because of the instruction provided, actually 
be written at the level of knowledge or comprehension. 
Thus, raters must recognize that classification of 
items is occasionally subjective. Illustrative 
examples of Bloom's Taxonomy such as those offered in 
Bloom, Madaus, and Hastings (1981) or Gronlund (1990) 
and alternative taxonomies like that offered by 
Quellmalz (1985) are helpful in reducing such 
subjectivity. Finally, teachers should be advised to 
avoid classifying items based on format; not all essay 
items are higher order items! 

Recommendations 

The Test Analysis Guide and Teacher Testing 
Questionnaire (Appendices A and B) are useful in 
providing in-service activities that are tailored to 
the needs of the school or district. Our results 
suggested four topics for inclusion in such activities. 
First, the preponderance of knowledge and comprehension 
level items and the failure of teachers to map test 
items to targeted objectives suggest a need to provide 
instruction in the use of tables of specifications. 
Opportunity for practice in constructing the table and 
mapping items can greatly improve the content validity 
of teacher-made tests. 
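
A table of specifications is, in effect, a tally of items by 
objective and cognitive level. The sketch below shows one 
way to build the tally and to list the cells that no item 
covers. The objectives, levels, and function names are 
hypothetical examples, not material from the study. 

```python
from collections import Counter


def table_of_specifications(items):
    """Count items in each (objective, cognitive level) cell."""
    return Counter(items)


def uncovered_cells(items, objectives, levels):
    """List the (objective, level) cells that no item addresses."""
    counts = table_of_specifications(items)
    return [(o, l) for o in objectives for l in levels if counts[(o, l)] == 0]


# Each test item is mapped to one objective and one cognitive level.
items = [
    ("solve linear equations", "comprehension"),
    ("solve linear equations", "comprehension"),
    ("solve linear equations", "knowledge"),
    ("graph linear functions", "knowledge"),
]
objectives = ["solve linear equations", "graph linear functions"]
levels = ["knowledge", "comprehension", "application"]

for cell, count in sorted(table_of_specifications(items).items()):
    print(cell, count)
print("uncovered:", uncovered_cells(items, objectives, levels))
```

Empty cells at the application level and above are exactly 
the gaps that the observed tests displayed. 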

A related weakness was noted in the lack of items 
addressing higher order skills. Writing items at the 
levels of analysis, synthesis and evaluation requires 
practice. In-service workshops can provide the 
mechanism through which teachers of the same discipline 
could help one another in developing higher order 
items. Measurement textbooks such as Gronlund's (1990) 
provide descriptions of the major categories of the 
cognitive domain, illustrative objectives and suggested 
verbs to use in stating student outcomes. An excellent 
source for tips in writing higher order items is Sax's 
(1989) text. With practice, teachers can write items 
that cover a breadth of content and learning outcomes. 

A third in-service activity should address 
weaknesses in specific item formats. Again, a basic 
measurement text will provide guidelines. As an 
example, multiple choice items should present the 
problem or question in the stem. The correct answer 
should not be discernible from its length in relation 
to the distracters or from language association with 
the stem. Guided practice, perhaps using some of their 
own items, will help teachers recognize and correct 
such common errors that compromise item reliability. 

Finally, the empirical analysis of test results 
should be discussed, but it is most important here to 
recognize that this is the phase that the already 
overburdened classroom teacher is least likely to 
incorporate into routine testing practices. Although 
computerized analyses are possible, analysis of any 
detail requires entering individual student responses 
into the computer either directly or through optical 
scan readers. This process can be simplified with 
software designed for use by classroom teachers 
(Oescher and DeGolyer, 1989). Of course, objective 
tests are required. Although computerized analysis is 
to be encouraged where possible, simple techniques to 
estimate item and test reliability can be mastered 
quickly. Again, suggestions are offered by Gronlund 
(1990), Sax (1989), and Carey (1988) that will greatly 
reduce the time involved in item and test analyses. 
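
As an illustration of how little computation a basic 
analysis requires, the sketch below estimates item 
difficulty (the proportion of students answering each item 
correctly) and test reliability for an objectively scored 
test. The Kuder-Richardson formula 20 is used here as one 
common reliability estimate; it is a choice made for this 
example and is not named in the sources cited above. The 
response matrix and function names are hypothetical. 

```python
from statistics import pvariance


def item_difficulty(responses):
    """Proportion of students answering each item correctly.

    responses: one list per student of 0/1 item scores.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    return [sum(student[i] for student in responses) / n_students
            for i in range(n_items)]


def kr20(responses):
    """Kuder-Richardson formula 20, using the population variance
    of the students' total scores."""
    k = len(responses[0])
    p = item_difficulty(responses)
    totals = [sum(student) for student in responses]
    item_variance_sum = sum(pi * (1 - pi) for pi in p)
    return (k / (k - 1)) * (1 - item_variance_sum / pvariance(totals))


# Hypothetical scores: four students, three items (1 = correct).
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(item_difficulty(responses))  # [0.75, 0.5, 0.25]
print(kr20(responses))             # 0.75
```

The formula is undefined when every student earns the same 
total score, and estimates from a class-sized sample should 
be read as rough guides only. 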

In concluding, it should be emphasized that the 
primary purpose of analyzing teacher testing practices 
is to inform the process of training for improvement. 
When the stakes for students are admittedly high, the 
tools of evaluation must be above reproach. 

Table 1 

Teachers' Self-reported Use of Item Formats 

                         Multiple  True-              Short 
                         Choice    False    Matching  Answer   Essay 

Average percentage of 
items written in each 
format                    15.46     8.31     15.46    40.23    20.69 

Average percentage of 
score obtained from 
each format               13.68     8.12     14.12    41.76    22.32 

Table 2 

ANOVA Summary Tables for the Effect of Subject Taught on the 
Percentage of Items Observed at Knowledge and Comprehension 
Levels 

Dependent Variable: Knowledge 

Source       df       SS         F 

Subject       1      3.49     94.04* 
Error        32      1.39 
Total        33      4.88 

*p < .0001 

Dependent Variable: Comprehension 

Source       df       SS         F 

Subject       1      2.84     74.63* 
Error        32      1.57 
Total        33      4.41 

*p < .0001 

Note. The mean percentage of items written at the 
knowledge level was 78.2 for science (N=15) and 12.5 
for math (N=19). The mean percentage at the 
comprehension level was 16.8 for science and 77.5 for 
math. 

Table 3 

Reported Testing Practices of 31 Math and Science Teachers 

Item                                      M       SD 

My tests are based on my                 4.85     .36 
instructional objectives 

I tally the number of items              3.31    [illegible] 
intended to measure each 
instructional objective 

I tally the number of items              2.97    1.15 
intended to measure each level 
of student performance 

I include written instructions           4.60     .91 
for each section of my tests 

My students are informed of the          4.06    1.00 
point value of each test item 

I complete an answer key for             4.80     .63 
each objective item before 
scoring tests 

I write out an appropriate or            4.38    1.15 
desired response for each essay 
item before scoring these items 

Scores on my tests are adjusted          1.76    1.23 
for guessing 

I assign the point values for            3.21    1.01 
individual items before 
correcting all tests 

I compute item analysis                  2.36     .90 
information for my tests 

I eliminate certain items                2.42     .61 
in determining test scores 

I compute an arithmetic mean             2.63    1.21 
of scores received by students 
for each test 

Note. Entries represent scores on the following 
modified five-point Likert Scale: 1 = Never, 
2 = Seldom, 3 = Sometimes, 4 = Frequently, 
and 5 = Always. 

Appendix A 

TEST ANALYSIS GUIDE 

ITEM INFORMATION 

ITEM FORMAT                      1=MULTIPLE CHOICE 
                                 2=TRUE/FALSE 
                                 3=MATCHING 
                                 4=SHORT ANSWER 
                                 5=ESSAY 
                                 6=CHOICE OF ESSAYS 
                                 Totals 

KNOWLEDGE LEVEL TARGETED         1=KNOWLEDGE 
                                 2=COMPREHENSION 
                                 3=APPLICATION 
                                 4=ANALYSIS 
                                 5=SYNTHESIS 
                                 6=EVALUATION 
                                 Totals 

AUTHOR                           1=DEFINITELY FROM TEXT OR WORKBOOK 
                                 2=PROBABLY FROM TEXT OR WORKBOOK 
                                 3=PROBABLY TEACHER-COMPOSED 
                                 Totals 

GENERAL TEST INFORMATION         SCORING KEY 

GRAMMATICAL ERRORS               1=MANY 
                                 2=FEW 
                                 3=NONE 

NUMBERING SYSTEM                 1=NOT LOGICAL 
                                 2=LOGICAL 

VALUE OF OVERALL TEST            1=NO 
INDICATED                        2=YES 

VALUE OF INDIVIDUAL              1=NO 
ITEMS INDICATED                  2=YES 

TEXT PRESENTATION                1=HAND-WRITTEN 
                                 2=TYPEWRITTEN 

INCLUSION OF INSTRUCTIONS        1=NO INSTRUCTIONS 
                                 2=INSTRUCTIONS FOR OVERALL TEST ONLY 
                                 3=INSTRUCTIONS FOR OVERALL TEST AND 
                                   SUBSECTIONS 

QUALITY OF INSTRUCTIONS          1=NOT APPLICABLE (NO INSTRUCTIONS) 
                                 2=NEBULOUS 
                                 3=EXPLICIT 

QUALITY OF DUPLICATION           1=UNREADABLE 
                                 2=READABLE WITH DIFFICULTY 
                                 3=READABLE 

QUALITY OF PRESENTATION          1=UNFORMATTED 
                                 2=PARTIALLY FORMATTED 
                                 3=FORMATTED 

QUALITY OF MULTIPLE CHOICE       1=NOT APPLICABLE (NO ITEMS OF THIS TYPE) 
ITEMS                            2=ERRORS IN MORE THAN 20% OF ITEMS 
                                 3=ERRORS IN LESS THAN 20% OF ITEMS 
                                 4=NO ERRORS 

QUALITY OF TRUE/FALSE            1=NOT APPLICABLE (NO ITEMS OF THIS TYPE) 
ITEMS                            2=ERRORS IN MORE THAN 20% OF ITEMS 
                                 3=ERRORS IN LESS THAN 20% OF ITEMS 
                                 4=NO ERRORS 

QUALITY OF MATCHING ITEMS        1=NOT APPLICABLE (NO ITEMS OF THIS TYPE) 
                                 2=ERRORS IN MORE THAN 20% OF ITEMS 
                                 3=ERRORS IN LESS THAN 20% OF ITEMS 
                                 4=NO ERRORS 

QUALITY OF SHORT ANSWER          1=NOT APPLICABLE (NO ITEMS OF THIS TYPE) 
ITEMS                            2=ERRORS IN MORE THAN 20% OF ITEMS 
                                 3=ERRORS IN LESS THAN 20% OF ITEMS 
                                 4=NO ERRORS 

QUALITY OF ESSAYS                1=NOT APPLICABLE (NO ITEMS OF THIS TYPE) 
                                 2=ERRORS IN MORE THAN 20% OF ITEMS 
                                 3=ERRORS IN LESS THAN 20% OF ITEMS 
                                 4=NO ERRORS 

Appendix B 

TEACHER TESTING QUESTIONNAIRE 

Please respond completely and truthfully to the following items. 
Your responses will be confidential. 

Name                          School 

Grade level(s) you teach               Subject(s) you teach 

A. Rank order the following according to the relative emphasis you place on each in 
evaluating student achievement in your classes. Assign "1" to the most heavily 
weighted measure, "2" to the 2nd most important, etc. 

Standardized tests 
Teacher-made tests 
Feedback obtained during instruction 
Classroom participation and effort 
Individual behavior (conduct) 

Although you may use varying kinds of tests in your classes, respond to the 
following items considering the general characteristics of all of the tests 
that you choose for classroom use. Do not consider tests administered by the 
school system. 

B. Assign percentages to each category in the following items. Be sure that each 
sums across to 100%. 

Items 1-3 refer to the actual test items you use. 

1.  % written by you 
    % obtained from texts, workbooks, etc. 
    % obtained from other sources (explain) 
    Total 100% 

2.  % requiring student to recall facts, terms, rules, or principles 
    % requiring student to demonstrate understanding by using procedures 
    % requiring student to apply rules or principles to new or unfamiliar 
      situations 
    % requiring student to synthesize prior learning in order to analyze and 
      evaluate new material 
    Total 100% 

3.  % Multiple choice items 
    % True/False items 
    % Matching items 
    % Short answer/fill-in items 
    % Essay items 
    Total 100% 

Items 4-5 refer to your analysis and use of test results. 

4.  % of score obtained from multiple choice items 
    % of score obtained from true/false items 
    % of score obtained from matching items 
    % of score obtained from short answer items 
    % of score obtained from essay items 
    Total 100% 

5.  % of tests used mainly for diagnostic purposes 
    % of tests used mainly for placement of students 
    % of tests used mainly for assigning student grades 
    % of tests used mainly to evaluate instruction 
    % of tests used mainly to reinforce instruction 
    Total 100% 



C. On a scale of 1 to 5 where 1 = never and 5 = always, indicate your response to each 
of the following items by circling the corresponding number. 

6. My tests are based on my instructional objectives. 

1 2 3 4 5 

Never Seldom Sometimes Frequently Always 

7. When composing a test, I tally the number of items intended to measure each 
instructional objective. 

1 2 3 4 5 

Never Seldom Sometimes Frequently Always 

8. When composing a test, I tally the number of items intended to measure each level of 
student performance (e.g., recall, understanding, etc.) 

1 2 3 4 5 

Never Seldom Sometimes Frequently Always 

9. My tests are hand-written. 

1 2 3 4 5 

Never Seldom Sometimes Frequently Always 

10. I include written instructions for each of the sections of my tests. 

1 2 3 4 5 

Never Seldom Sometimes Frequently Always 

11. My students are informed of the point value of each item on my tests. 

1 2 3 4 5 

Never Seldom Sometimes Frequently Always 

12. I complete an answer key for objective items before scoring my tests. 

1 2 3 4 5 

Never Seldom Sometimes Frequently Always 

13. I write out an appropriate or desired response for each essay item before scoring 
these items on my tests. 

1 2 3 4 5 

Never Seldom Sometimes Frequently Always 

14. Scores on my tests are adjusted for guessing. 

1 2 3 4 5 

Never Seldom Sometimes Frequently Always 

15. I assign the point values of individual test items after correcting all tests. 

1 2 3 4 5 

Never Seldom Sometimes Frequently Always 

16. I assign test grades based on how well students perform relative to others in the 
group (norm-referenced perspective). 

1 2 3 4 5 

Never Seldom Sometimes Frequently Always 

17. I assign test grades based on mastery of content regardless of performance of others 
in the group (criterion-referenced perspective). 

1 2 3 4 5 

Never Seldom Sometimes Frequently Always 

18. I compute item analysis information for my tests. 

1 2 3 4 5 

Never Seldom Sometimes Frequently Always 

19. I eliminate certain items in determining test scores. 

1 2 3 4 5 

Never Seldom Sometimes Frequently Always 

20. I compute an arithmetic mean of the scores received by students for each test. 

1 2 3 4 5 

Never Seldom Sometimes Frequently Always 

21. I review tests with students after administering and scoring them. 

1 2 3 4 5 

Never Seldom Sometimes Frequently Always 

22. I use test results to identify student weaknesses. 

1 2 3 4 5 

Never Seldom Sometimes Frequently Always 

23. I revise my instructional plans based on test results. 

1 2 3 4 5 

Never Seldom Sometimes Frequently Always 

24. I assign remedial or supplemental work to individual students based on test results. 

1 2 3 4 5 

Never Seldom Sometimes Frequently Always 

D. On a scale of 1 to 5 where 1 = strongly disagree and 5 = strongly agree, indicate 
your response to each of the following items by circling the corresponding number. 

25. I feel confident in my ability to construct valid and reliable tests. 

1 2 3 4 5 

Strongly disagree Disagree Not sure Agree Strongly agree 

26. I feel confident in my ability to assess the reliability and validity of my tests. 

1 2 3 4 5 

Strongly disagree Disagree Not sure Agree Strongly agree 

27. I received adequate pre-service training in testing and student evaluation. 

1 2 3 4 5 

Strongly disagree Disagree Not sure Agree Strongly agree 

28. I received adequate in-service training in testing and evaluation. 

1 2 3 4 5 

Strongly disagree Disagree Not sure Agree Strongly agree 



THANK YOU FOR YOUR PARTICIPATION! 

Figure 1: A comparison of reported and observed percentages 
of items written in each format. 

Units: Percentages 

Figure 2: A comparison of reported and observed percentages 
of items addressing each level of knowledge. 

Units: Percentages 